Release 5.5.0: embeddings tooling, image bytes pipeline, and search/ACL hardening #17
Merged
Pull request overview
Release 5.5.0 of parse-stack-next, adding new embeddings/image-bytes functionality and multiple security hardening and correctness fixes across vector search, retrieval, ACL aggregation behavior, and webhook afterSave callback dispatch.
Changes:
- Adds SDK-side image bytes fetch (`embed_image` with `source: :bytes`) with magic-byte MIME verification, allowlists, and EXIF/XMP stripping; updates the Cohere/Voyage image providers accordingly.
- Introduces embedding ops tooling: `BatchEmbedder`, an opt-in query-embed cache (`Parse::Embeddings::Cache` + Moneta adapter), and expanded spend-cap controls (query charging + `warn_at`).
- Adds vectorSearch index drift verification, hybrid-search hardening, retrieval pointer-filter translation, and several ACL/aggregation and webhook routing fixes; updates docs/tests and bumps the version to 5.5.0.
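The magic-byte check in the first bullet can be sketched in plain Ruby. This is a minimal illustration, not the SDK's `Parse::Embeddings::ImageFetch` code: the module name `ImageSniffSketch`, its signature table, and the `sniff`/`verify!` methods are hypothetical, and the sketch covers only the byte-sniff and type-allowlist step (no URL-extension cross-check, host allowlist, or EXIF/XMP stripping).

```ruby
# Hypothetical sketch of magic-byte MIME sniffing; not the SDK's ImageFetch.
module ImageSniffSketch
  # Leading-byte signatures for a few common image formats.
  SIGNATURES = {
    "image/jpeg" => ["\xFF\xD8\xFF".b],
    "image/png"  => ["\x89PNG\r\n\x1A\n".b],
    "image/gif"  => ["GIF87a".b, "GIF89a".b]
  }.freeze

  module_function

  # Returns the MIME type implied by the leading bytes, or nil if none match.
  def sniff(bytes)
    data = bytes.b
    # WebP: "RIFF", a 4-byte little-endian size, then "WEBP".
    if data.bytesize >= 12 && data.byteslice(0, 4) == "RIFF".b && data.byteslice(8, 4) == "WEBP".b
      return "image/webp"
    end
    SIGNATURES.each do |mime, prefixes|
      return mime if prefixes.any? { |prefix| data.start_with?(prefix) }
    end
    nil
  end

  # The sniffed type is the only input: a declared Content-Type or URL
  # extension is never used as a fallback when sniffing fails.
  def verify!(bytes, allowed_types)
    mime = sniff(bytes)
    unless mime && allowed_types.include?(mime)
      raise ArgumentError, "image bytes sniffed as #{mime.inspect}, not in #{allowed_types.inspect}"
    end
    mime
  end
end
```

Failing closed when nothing matches means an SVG or HTML payload served with an `image/png` header is rejected rather than laundered through as an image.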
Reviewed changes
Copilot reviewed 50 out of 51 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/lib/parse/webhook_triggers_test.rb | Adjusts tests for afterSave callback chaining behavior |
| test/lib/parse/webhook_aftersave_payload_fidelity_test.rb | Drives afterSave chain explicitly in lifecycle dispatch helper |
| test/lib/parse/verify_password_rate_limit_test.rb | Adds rate-limit parity tests for verify_password |
| test/lib/parse/vector_search_hybrid_security_test.rb | Adds hybrid-search security regression tests |
| test/lib/parse/vector_index_drift_test.rb | Adds drift verification policy + findings tests |
| test/lib/parse/search_index_migrator_tenant_filter_test.rb | Tests auto-added tenant filter path in vectorSearch declarations |
| test/lib/parse/retrieval_pointer_filter_test.rb | Adds pointer-value translation tests for retrieval filters |
| test/lib/parse/regex_unicode_option_unit_test.rb | Tests opt-in unicode regex options compilation |
| test/lib/parse/query/hint_mongo_direct_integration_test.rb | Adds Mongo-direct integration test for query hints |
| test/lib/parse/query/constraints/acl_query_constraints_test.rb | Updates ACL constraint tests for new alias/semantics |
| test/lib/parse/embeddings_voyage_image_test.rb | Updates Voyage image input validation error expectations |
| test/lib/parse/embeddings_spend_cap_query_test.rb | Adds query spend-cap coverage + warn_at tests |
| test/lib/parse/embeddings_image_fetch_test.rb | Adds tests for ImageFetch sniff/verify/strip/fetch pipeline |
| test/lib/parse/embeddings_cohere_image_test.rb | Updates Cohere image input validation error expectations |
| test/lib/parse/embeddings_cache_test.rb | Adds tests for embedding cache + Moneta adapter |
| test/lib/parse/embeddings_batch_embedder_test.rb | Adds tests for batch slicing/pacing/backoff behavior |
| test/lib/parse/embed_managed_meta_reembed_test.rb | Tests `<into>_meta`, `reembed!`, and bytes-mode `embed_image` |
| test/lib/parse/cloud_result_decode_test.rb | Adds cloud decode sessionToken preservation tests |
| test/lib/parse/cloud_functions_module_test.rb | Adds raw: behavior tests for cloud function calls |
| test/lib/parse/aggregation_auto_promotion_test.rb | Updates scoped aggregation fail-closed behavior tests |
| test/lib/parse/agent/mcp_resource_subscriptions_test.rb | Adds authorization gate parity tests for subscriptions |
| test/lib/parse/acl_constraints_unit_test.rb | Expands ACL aggregation routing + fail-closed regression tests |
| README.md | Documents 5.5 features and updates capability notes |
| lib/parse/webhooks/payload.rb | Fixes ruby-initiated memoization behavior |
| lib/parse/webhooks.rb | Moves afterSave callback chain to once-per-delivery path + adds safety |
| lib/parse/vector_search/hybrid.rb | Hardens probe classification + recomputes visible-order hybrid scores |
| lib/parse/vector_search.rb | Adds index drift policy config surface |
| lib/parse/stack/version.rb | Bumps version to 5.5.0 |
| lib/parse/schema/search_index_migrator.rb | Auto-augments vectorSearch declarations with tenant filter path |
| lib/parse/retrieval/retriever.rb | Adds pointer-value translation for retrieval filters |
| lib/parse/retrieval/agent_tool.rb | Wraps retrieval in spend-cap precharged scope |
| lib/parse/query/constraint.rb | Adds helper to parse { value:, unicode: true } regex option form |
| lib/parse/model/core/vector_searchable.rb | Adds spend-cap charging, cache hook, and index drift verification |
| lib/parse/model/core/embed_managed.rb | Adds bytes-mode embed_image, provenance meta, and reembed! |
| lib/parse/model/acl.rb | Adds include_missing toggle and strict predicate shaping |
| lib/parse/embeddings/voyage.rb | Supports FetchedImage inputs + base64 rows for Voyage |
| lib/parse/embeddings/spend_cap.rb | Adds warn_at, with_precharged, and query charging |
| lib/parse/embeddings/provider.rb | Updates provider contract for image sources (URL or FetchedImage) |
| lib/parse/embeddings/image_fetch.rb | Adds ImageFetch (sniff/verify/strip/fetch + FetchedImage) |
| lib/parse/embeddings/cohere.rb | Supports FetchedImage inputs + data-URI forwarding for Cohere |
| lib/parse/embeddings/cache.rb | Adds opt-in query-embed cache + Moneta adapter |
| lib/parse/embeddings/batch_embedder.rb | Adds batch-level embed orchestration with pacing/backoff |
| lib/parse/embeddings.rb | Wires new embeddings components + allowed_image_types + fetch-mode validation |
| lib/parse/client.rb | Documents trusted cloud decode behavior + raw guidance |
| lib/parse/api/users.rb | Adds shared login rate-limit to verify_password |
| Gemfile.lock | Updates gem version to 5.5.0 |
| docs/atlas_vector_search_guide.md | Documents drift verification, caching/spend caps, bytes-mode embedding, reembed tooling |
| CHANGELOG.md | Adds 5.5.0 release notes |
Comment on lines +143 to +146

```ruby
io = Parse::File.safe_open_url(canonical)
bytes = io.read
bytes = bytes.to_s.dup.force_encoding(Encoding::BINARY)
io.close if io.respond_to?(:close)
```
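In the quoted snippet, `io.close` only runs if `io.read` and the re-encoding succeed, so an exception mid-read leaves the handle open. A minimal sketch of the usual `begin`/`ensure` pattern follows; `read_all_binary` is a hypothetical helper name, and the opener is passed as a lambda so the example runs without `Parse::File.safe_open_url`.

```ruby
require "stringio"

# Hypothetical helper: obtain an IO-like object from `open_io`, read all of
# it as binary, and close it even if the read raises.
def read_all_binary(open_io)
  io = open_io.call
  begin
    # dup before force_encoding in case read returns a frozen string
    io.read.to_s.dup.force_encoding(Encoding::BINARY)
  ensure
    io.close if io.respond_to?(:close)
  end
end
```

Wired into the SDK this would presumably read `read_all_binary(-> { Parse::File.safe_open_url(canonical) })`, but that call shape is an assumption.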
Comment on lines +119 to +127

```ruby
def index_drift_policy=(value)
  v = value.to_sym
  unless INDEX_DRIFT_POLICIES.include?(v)
    raise ArgumentError,
          "Parse::VectorSearch.index_drift_policy must be one of " \
          "#{INDEX_DRIFT_POLICIES.inspect} (got #{value.inspect})."
  end
  @index_drift_policy = v
end
```
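As written, `value.to_sym` runs before the allowlist check, so `nil` or an Integer raises `NoMethodError` rather than the intended `ArgumentError`. A minimal sketch that guards the conversion first; `DriftPolicySketch` is a hypothetical stand-in for `Parse::VectorSearch`, and the policy list mirrors the `:warn`/`:raise`/`:ignore` values named in this PR.

```ruby
# Hypothetical stand-in for the index_drift_policy setter.
module DriftPolicySketch
  INDEX_DRIFT_POLICIES = %i[warn raise ignore].freeze

  class << self
    attr_reader :index_drift_policy

    def index_drift_policy=(value)
      # Only Strings/Symbols convert; anything else becomes nil and is
      # rejected below with ArgumentError instead of NoMethodError.
      v = value.respond_to?(:to_sym) ? value.to_sym : nil
      unless INDEX_DRIFT_POLICIES.include?(v)
        raise ArgumentError,
              "index_drift_policy must be one of " \
              "#{INDEX_DRIFT_POLICIES.inspect} (got #{value.inspect})."
      end
      @index_drift_policy = v
    end
  end
end
```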
Branch updated from `cccbdb0` to `aae716f`.
Bumps the SDK to 5.5.0 and adds a major set of embedding, image, caching, migration, and hardening features. Key changes:
- Multimodal image bytes path: SDK-side image download via Parse::Embeddings::ImageFetch with magic-byte MIME sniffing, URL-extension cross-check, and configurable allowed_image_types; EXIF/XMP stripping is on by default; embed_image now supports source: :bytes and FetchedImage objects to avoid provider-side fetches.
- Bulk embedding & resilience: Parse::Embeddings::BatchEmbedder adds batch slicing, inter-batch pacing, exponential backoff with jitter, and a BatchFailed error for resumable jobs.
- Query-embed cache: Parse::Embeddings::Cache (opt-in, LRU+TTL) with MonetaStore adapter for persistent L2 sharing and a hashed keyspace to avoid plaintext queries landing in stores; cache hits emit existing embed notifications with cached: true.
- Spend-cap improvements: SpendCap now covers all query-embed paths (direct callers included), supports warn_at soft-cap notifications, and provides tooling to avoid double-billing for agent tools.
- Embedding provenance & migrations: auto-declared `<into>_meta` object with `{provider, model, dimensions, modality, embedded_at}`; `Class.reembed!` for resumable bulk re-embeds; guidance for same-shape vs changed-width migrations and a dual-field workflow.
- Vector index drift detection: first-query verification of Atlas vectorSearch index numDimensions/similarity and tenant-scope coverage with configurable Parse::VectorSearch.index_drift_policy (:warn/:raise/:ignore).
- Retrieval & filter hardening: pointer-value translation into MongoDB storage form for pointer-valued filters; various ACL/aggregation fixes, plus stricter input validation and a `strict:` option for permission constraints; aggregation terminals now route via mongo-direct when necessary and fail closed when scoped and mongo-direct is unavailable.
- Hybrid search & ACL fixes: rankFusion score recomputation for scoped callers, probe error-class narrowing, and several webhook `after_save` callback hardening fixes (single-run semantics, with callback errors swallowed where appropriate).
- Client ergonomics & docs: README, changelog and Atlas vector search guide updated with new features, examples, and operator notes; numerous tests added/updated for embeddings, image fetch, cache, batch embedder, vector drift, retrieval filters and webhook behavior.
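The query-embed cache bullet (LRU + TTL, hashed keyspace) can be illustrated with a small in-memory sketch. `EmbedCacheSketch` and its methods are hypothetical and are not the `Parse::Embeddings::Cache` API; the `MonetaStore` L2 layer and the `cached: true` notifications are omitted. Keys are SHA-256 digests of the model name and query text, so the raw query never becomes a key.

```ruby
require "digest"

# Hypothetical in-memory LRU + TTL cache for query embeddings.
class EmbedCacheSketch
  def initialize(max_entries:, ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_entries = max_entries
    @ttl = ttl
    @clock = clock
    @store = {} # insertion-ordered: least recently used entry first
  end

  # Hash (model, text) so plaintext queries never appear as keys.
  def self.key_for(model, text)
    Digest::SHA256.hexdigest("#{model}\0#{text}")
  end

  # Returns the cached vector, or nil on a miss or an expired entry.
  def fetch(model, text)
    key = self.class.key_for(model, text)
    entry = @store.delete(key)
    return nil unless entry
    return nil if @clock.call - entry[:at] > @ttl # expired entries stay deleted
    @store[key] = entry # re-insert at the end: now most recently used
    entry[:vector]
  end

  def store(model, text, vector)
    key = self.class.key_for(model, text)
    @store.delete(key)
    @store[key] = { vector: vector, at: @clock.call }
    @store.shift while @store.size > @max_entries # evict least recently used
    vector
  end

  def size
    @store.size
  end
end
```

Ruby's `Hash` preserves insertion order, so deleting and re-inserting an entry on each hit is enough to keep the least recently used entry at the front for `shift` to evict.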
Overall this changeset hardens embedding/image handling (PII protections and MIME-laundering prevention), adds operational tooling for bulk re-embedding and caching, and tightens vector-search / ACL correctness and safety.
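For the bulk re-embedding tooling, here is a sketch of batch slicing with retries, exponential backoff, and full jitter, raising an error that carries `batch_index`/`completed_count` so a caller could resume. `embed_in_batches`, `backoff_delay`, and `BatchFailedSketch` are hypothetical names, the default `base:`/`cap:` values are arbitrary, and inter-batch pacing (`requests_per_minute:`) is not modeled; this is not `Parse::Embeddings::BatchEmbedder`.

```ruby
# Hypothetical error mirroring the batch_index/completed_count idea.
class BatchFailedSketch < StandardError
  attr_reader :batch_index, :completed_count

  def initialize(batch_index, completed_count, cause_error)
    @batch_index = batch_index
    @completed_count = completed_count
    super("batch #{batch_index} failed after #{completed_count} items: #{cause_error.message}")
  end
end

# Full jitter: uniform in [0, min(cap, base * 2**(attempt - 1))).
def backoff_delay(attempt, base: 0.5, cap: 30.0, rng: Random.new)
  rng.rand * [cap, base * (2**(attempt - 1))].min
end

# Yields each slice to the block (the embed call) and retries failures.
def embed_in_batches(texts, batch_size:, max_attempts: 3, sleeper: ->(s) { sleep(s) }, rng: Random.new)
  results = []
  texts.each_slice(batch_size).with_index do |batch, index|
    attempt = 0
    begin
      attempt += 1
      results.concat(yield(batch))
    rescue StandardError => e
      raise BatchFailedSketch.new(index, results.size, e) if attempt >= max_attempts
      sleeper.call(backoff_delay(attempt, rng: rng))
      retry
    end
  end
  results
end
```

Because `completed_count` counts only items from fully successful batches, a resumed job can skip exactly that many inputs.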
Branch updated from `aae716f` to `17b4e47`.
Bumps `parse-stack-next` to 5.5.0. This release adds an SDK-side image bytes pipeline for image embeddings, bulk embedding operations tooling, and a set of security-hardening and correctness fixes across vector search, retrieval, ACL aggregation routing, and webhook dispatch.

Breaking changes

- The `:ACL.writeable_by` operator now resolves to the same public-inclusive, role-expanding implementation as `:ACL.writable_by`. Previously the one-letter spelling difference silently selected a separate strict, non-role-expanding constraint, so the two spellings produced different result sets. Code that relied on the old strict behavior of `writeable_by` should pass `strict: true` or use the `:writable_by_exact` operator.
- An invalid `readable_by`/`writable_by` permission array (or an unsupported Symbol) now raises `ArgumentError` instead of being silently dropped, which weakened the intended filter.

New

- `embed_image ..., source: :bytes` fetches image bytes SDK-side through `Parse::Embeddings::ImageFetch` with magic-byte MIME verification (no header/extension fallthrough), a deny-by-default `allowed_image_hosts` allowlist, configurable `allowed_image_types`, and EXIF/XMP metadata stripping (on by default via `exif_strip`). `validate_image_url!(mode: :fetch)` validates URLs for SDK-side fetch without requiring the provider-egress sentinel.
- `Parse::Embeddings::BatchEmbedder`: bulk embedding with rate pacing (`requests_per_minute:`), retry with backoff (`max_attempts:`, `retry_on:`), progress callbacks, and `BatchFailed` errors that report `batch_index`/`completed_count` for resumability.
- Opt-in query-embed cache (`Parse::Embeddings::Cache.enable!`) with LRU + TTL semantics, a fail-open `MonetaStore` adapter for shared backends, and hit/miss instrumentation.
- `SpendCap`: a `warn_at:` soft-cap threshold with a one-shot warning event per crossing, plus query-side charging (`charge_query!`, `with_precharged`).
- `reembed!` and embedding provenance: model-aware re-embedding with `only_stale:` filtering, and an auto-declared `<into>_meta` object field recording provider/model/dimensions for each managed vector.
- First-query vector index drift verification, controlled by `Parse::VectorSearch.index_drift_policy` (`:warn`/`:raise`/`:ignore`).
- Retrieval filters translate pointer values into MongoDB storage form (`_p_<field>` / `Class$objectId`) before tenant-scope folding.
- Opt-in Unicode regex: the `{ value: /.../, unicode: true }` constraint form compiles to `$options: "iu"` without changing default regex behavior.

Fixed / hardened

- Aggregation terminals scoped with …, `acl_user`, or `acl_role` raise `MongoDirectRequired` instead of silently running over the master-key-only REST `/aggregate` endpoint when mongo-direct is unavailable.
- Hybrid search recomputes `_hybrid_score` from the post-ACL visible order, closing a membership-inference side channel, and no longer caches authorization errors as "rankFusion unsupported".
- ACL constraints: strict (`*_exact`) variants, corrected `not_readable_by`/`not_writable_by` semantics for missing-ACL documents, empty-intent (`readable_by([])`) matching, and role self-inclusion in role expansion (an unpersisted role no longer raises "no valid permissions").
- Webhooks: the `after_save` chain now runs exactly once per delivery (fixes class-route + wildcard double-fire), with per-phase error isolation and corrected `ruby_initiated?` memoization.
- `verify_password` now shares the `login` rate-limit bucket, closing a credential-probing bypass.

Docs

- `docs/atlas_vector_search_guide.md` updated for all new APIs; CHANGELOG updated for 5.5.0.
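The `warn_at:` soft cap listed under New ("a one-shot warning event per crossing") can be modeled as a flag that is set on the first crossing and cleared when the spend window resets. `SpendCapSketch`, its `charge!`/`reset!` methods, and the choice to raise before a charge would exceed the hard limit are assumptions for illustration, not the SDK's `SpendCap` behavior.

```ruby
# Hypothetical soft/hard spend cap, not Parse's SpendCap.
class SpendCapSketch
  class Exceeded < StandardError; end

  attr_reader :spent

  def initialize(limit:, warn_at: nil, on_warn: ->(spent, threshold) { warn("spend #{spent} reached soft cap #{threshold}") })
    @limit = limit
    @warn_at = warn_at
    @on_warn = on_warn
    reset!
  end

  # Refuses a charge that would exceed the hard limit; otherwise records it
  # and fires the soft-cap callback the first time spend reaches warn_at.
  def charge!(amount)
    if @spent + amount > @limit
      raise Exceeded, "charge of #{amount} would exceed limit #{@limit} (spent #{@spent})"
    end
    @spent += amount
    if @warn_at && !@warned && @spent >= @warn_at
      @warned = true # one-shot: stays quiet until reset!
      @on_warn.call(@spent, @warn_at)
    end
    @spent
  end

  # Starts a new spend window and re-arms the soft-cap warning.
  def reset!
    @spent = 0
    @warned = false
  end
end
```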